An Efficient Nonnegative Matrix Factorization & Game Theoretic Framework Based Data Clustering

Author

  • Dr. V. Umadevi
Abstract

Matrix factorizations are generally not unique. Non-negative Matrix Factorization (NMF) differs from Principal Component Analysis, Singular Value Decomposition, and the Nyström method in that it imposes the constraint that the factors must be non-negative. The proposed method uses a powerful tool derived from evolutionary game theory, which allows the clustering obtained with NMF to be reorganized so that it is consistent with the structure of the data set. We propose a method to refine the clustering results obtained with the nonnegative matrix factorization (NMF) technique by imposing consistency constraints on the final labeling of the data set. The research community has focused its effort on the initialization and optimization parts of this method, without paying attention to the final cluster assignments. We propose a game theoretic framework in which each object to be clustered is represented as a player that has to choose its cluster membership. The information obtained with NMF is used to initialize the strategy space of the players, and a weighted graph is used to model the interactions among all the players. These interactions allow each player to choose a cluster that is coherent with the clusters chosen by similar players, a property that is not guaranteed by NMF, since it produces a soft clustering of the data. Results on common benchmarks show that our model is able to improve the performance of many NMF formulations.

Keywords: Data Cluster; Nonnegative Matrix Factorization; Weighted Graph; Game Theoretic.

I. INTRODUCTION

Clustering has received a significant amount of attention as an important problem with many applications, and a number of different algorithms have emerged over the years. Recently, the use of Non-Negative Matrix Factorization (NMF) for partitional clustering has attracted much interest.
The popularity of NMF increased significantly after the proposal of multiplicative NMF algorithms, which their authors applied to image data. At present, NMF and its variants have found a wide spectrum of applications in areas such as pattern recognition and feature extraction, dimensionality reduction, segmentation and clustering, text mining, and neurobiology. Matrix factorization is used in a wide range of important applications. Each matrix factorization relies on an assumption about its components and their underlying structure, so choosing a factorization is an essential step in each application domain. Very often, the data sets to be analysed are non-negative, and sometimes they also have a sparse representation. In machine learning, sparseness is strongly related to feature selection and to certain generalizations in learning algorithms, while non-negativity relates to probability distributions.

Clustering [6] is an unsupervised learning task that discovers unknown groups of related data. In information retrieval, bioinformatics, and digital image processing, it has been a very important problem, for which many algorithms have been developed using various objective functions. K-means clustering is a well-known scheme that tries to minimize the sum of squared distances between each data point and its own cluster center. K-means has been widely applied thanks to its relative simplicity. However, it is well known that K-means is prone to finding only a local minimum and therefore depends strongly on its initial conditions. A common approach in this situation is to run K-means with many different initial conditions and choose the best solution. Modifications to the algorithm have also been devised, either by refining the initial guesses or by introducing variations, at the expense of more processing time.
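As a minimal sketch of the multiplicative NMF updates mentioned above, the following Python code factors a nonnegative matrix and reads hard cluster labels from the result. It assumes the Frobenius-norm objective. The function names, the random initialization, and the parameters `n_iter` and `eps` are illustrative choices, not taken from the paper.

```python
import numpy as np


def nmf_multiplicative(X, k, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix X (n x m) as W (n x k) @ H (k x m).

    Uses multiplicative updates for the squared Frobenius error.
    Both factors stay nonnegative because every update multiplies by a
    nonnegative ratio.
    """
    rng = np.random.default_rng(seed)
    n, m = X.shape
    # Strictly positive random start (an assumption of this sketch).
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # eps in the denominators guards against division by zero.
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H


def hard_labels(W):
    """Assign each row (object) to the factor with the largest coefficient."""
    return W.argmax(axis=1)


if __name__ == "__main__":
    X = np.array([[5.0, 4.0, 0.0, 0.0],
                  [4.0, 5.0, 0.0, 0.0],
                  [0.0, 0.0, 3.0, 4.0],
                  [0.0, 0.0, 4.0, 3.0]])
    W, H = nmf_multiplicative(X, k=2)
    print("reconstruction error:", np.linalg.norm(X - W @ H))
    print("labels:", hard_labels(W))
```

Taking the arg-max of each row of W turns the soft assignment into a hard one. This is the step that ignores relationships between objects.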
Other work reformulated the minimization problem as a trace maximization problem and suggested an algorithm using a matrix factorization. For the same objective function, the sum of squared errors, a more recently proposed method called Affinity Propagation performs clustering by passing messages between data points.

International Journal of Computer Trends and Technology (IJCTT) – Volume 49 Number 1 July 2017 ISSN: 2231-2803 http://www.ijcttjournal.org Page 52

Clustering, which partitions a data set into different groups in an unsupervised manner, is one of the most essential topics in statistical learning. Most established clustering methods are designed for one-side clustering, i.e. they cluster either data points or features. Still, in many real-world applications, the analysis is interested in two-side clustering results, i.e. grouping the data points and the features simultaneously, e.g. "words" and "documents" in document analysis, "items" and "users" in collaborative filtering, "genes" and "samples" in microarray data analysis, etc. Typically, instead of being independent, the clustering tasks on data and on features are closely connected, and it is challenging for traditional clustering algorithms to exploit the interdependence between data and features efficiently. Consequently, co-clustering procedures, which aim to cluster both features and data simultaneously by using the interrelations between them, have been proposed in recent research [7].

However, the techniques mentioned above focus on one-side clustering, i.e. clustering the data side based on the relationships along the feature side. Several co-clustering techniques have been proposed in the past decade, motivated by the duality between data points (e.g. documents) and features (e.g. words): data points can be grouped based on their distribution over features, while features can be grouped based on their distribution over the data points. These techniques have been shown to be superior to conventional one-side clustering. For instance, one approach is a bipartite spectral graph partitioning method to co-cluster documents and words. However, it requires that each document cluster be associated with a word cluster, which is a very strong constraint. Another existing approach is an information-theoretic co-clustering algorithm, which can be seen as the extension of the information bottleneck method to two-side clustering. A further existing approach is orthogonal nonnegative matrix tri-factorization (ONMTF) to co-cluster documents and words, which has an elegant mathematical form and encouraging performance.

Fig. 1: Proposed NMF method

II. RELATED WORK

For document clustering, a typical approach is Latent Semantic Analysis (LSA), which involves a Singular Value Decomposition (SVD) of the document-term matrix. Earlier work explored the relationship between NMF and Probabilistic Latent Semantic Analysis (PLSA), concluding that hybrid combinations of the two give the best results. Other authors established links between PLSA and NMF, and they claim that PLSA solves NMF with the KL I-divergence and that, for this cost function, PLSA provides better consistency. A comparison of several NMF techniques on various databases was also performed, concluding that the NMF techniques generally give better performance than k-means. In fact, the NMF method is rather equivalent to soft PLSA, and k-means typically also gives results equivalent to those of NMF. More recently, other work explored the relationships between K-means/spectral clustering and Nonnegative Matrix Factorization (NMF), and proposed using Nonnegative Matrix Tri-factorization (NMTF) to co-cluster words and documents at the same time.
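For reference, the tri-factorization used for co-clustering is commonly written as the optimization problem below. The symbols are notation assumed here, not taken from the paper: X is the nonnegative data matrix, F and G are the cluster indicator factors for its two sides, and S is the factor linking them. The orthogonality constraints correspond to the orthogonal variant (ONMTF). Plain NMTF drops them.

```latex
\min_{F \ge 0,\; S \ge 0,\; G \ge 0} \; \left\| X - F S G^{\top} \right\|_F^2
\quad \text{subject to} \quad F^{\top} F = I, \quad G^{\top} G = I
```

Each iteration of a solver for this problem multiplies three factors together, which is consistent with the computational cost attributed to NMTF-based co-clustering.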
Due to its mathematical elegance and promising empirical results, NMTF has been further developed to address various aspects of co-clustering. However, a notorious bottleneck of NMTF-based co-clustering techniques is their slow computational speed, caused by the intensive matrix multiplications involved in each iteration of the solution algorithms, which makes these techniques hard to apply to large-scale data in real-world applications [14].

Earlier work applied bi-clustering (co-clustering) to gene expression data and advocated the importance of such simultaneous clustering of genes and conditions for discovering more coherent and meaningful clusters. The authors formulated the bi-clustering problem by proposing a mean squared residue score for measuring cluster quality. One of the earliest bi-clustering formulations, block clustering, was introduced by Hartigan, who called it direct clustering. He proposed various bi-clustering quality measures and models, including the partition model used in this paper. However, he gives only a greedy technique for a hierarchical co-clustering model. This method begins with the entire data set in a single block and then, at each stage, finds the row or column split of every block into two pieces, choosing the one that produces the largest reduction in the total within-block variance. The splitting continues until the reduction in within-block variance due to further splitting is less than a given threshold [4].

Existing work also proposed non-overlapping and overlapping two-mode partitioning techniques, of which the non-overlapping technique attempts to minimize the same objective function. A bi-clustering method for gene expression data uses the mean squared residue as the measure of the coherence of the genes and conditions. The algorithm constructs one bi-cluster at a time. A low mean squared residue plus a large variation from the constant gives a good criterion for identifying a bi-cluster. A sequence of node (i.e. row or column) removals and additions is applied to the gene-condition matrix, while the mean squared residue of the bi-cluster is kept below a given threshold. Later work presented an algorithm called FLOC (FLexible Overlapped bi-Clustering) that simultaneously produces k bi-clusters whose mean residues are all less than a predefined constant r. FLOC incrementally moves a row or column out of or into a bi-cluster, depending on whether the row or column is already included in that bi-cluster or not. Such a move is called an action [20].

Spectral bi-clustering approaches similar to the one proposed have been applied to gene expression data to produce a checkerboard structure. The largest several left and right singular vectors of the normalized gene expression matrix are computed, and then a final clustering step using k-means and normalized cuts is applied to the data projected onto the topmost singular vectors. Different normalizations of genes and conditions have been considered. Information-theoretic bi-clustering algorithms view a non-negative matrix as an empirical joint probability distribution of two discrete random variables and pose the bi-clustering problem as an optimization problem in information theory: the optimal bi-clustering maximizes the mutual information between the clustered random variables, subject to constraints on the number of row and column clusters [5].

Principal component analysis (PCA) is a widely used statistical technique for unsupervised dimension reduction. K-means clustering is a commonly used data clustering method for unsupervised learning tasks. It has been shown that the principal components are the continuous solutions to the discrete cluster membership indicators for K-means clustering.
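The mean squared residue score discussed above can be illustrated with a short Python sketch. It assumes the usual definition, in which each entry of the selected submatrix is compared with its row mean, its column mean, and the overall mean. The function name and arguments are hypothetical.

```python
import numpy as np


def mean_squared_residue(A, rows, cols):
    """Mean squared residue of the bi-cluster of A given by rows x cols.

    A score of 0 means every entry is explained exactly by a row effect
    plus a column effect, i.e. the bi-cluster is perfectly coherent.
    """
    sub = np.asarray(A, dtype=float)[np.ix_(rows, cols)]
    row_mean = sub.mean(axis=1, keepdims=True)   # mean of each selected row
    col_mean = sub.mean(axis=0, keepdims=True)   # mean of each selected column
    all_mean = sub.mean()                        # mean of the whole bi-cluster
    residue = sub - row_mean - col_mean + all_mean
    return float((residue ** 2).mean())


if __name__ == "__main__":
    # Additive pattern: entry = row offset + column offset.
    coherent = np.add.outer([0.0, 1.0, 2.0], [10.0, 20.0, 30.0, 40.0])
    print(mean_squared_residue(coherent, [0, 1, 2], [0, 1, 2, 3]))
    # A 2 x 2 checkerboard is not additive, so the score is positive.
    print(mean_squared_residue(np.eye(2), [0, 1], [0, 1]))
```

A node-removal or node-addition step of the kind described above would recompute this score after each change and accept the change only while the score stays below the chosen threshold.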
To address this problem, we propose a Dual Regularized Co-Clustering (DRCC) method based on semi-nonnegative matrix tri-factorization, which inherits the advantages of ONMTF. We assume that not only the data points but also the features are discrete samplings from some manifolds, namely the data manifold and the feature manifold respectively [21].

III. PROPOSED APPROACH

The proposed approach employs game theory, which allows the clustering obtained with NMF to be reorganized, making it consistent with the structure of the data set. Our approach requires the cluster membership to be re-negotiated for all the objects. To this end, we use a dynamical system perspective, in which similar objects are required to belong to similar clusters, so that the final clustering is consistent with the structure of the data set. This perspective has demonstrated its effectiveness in different semantic categorization scenarios, which involve a high number of interrelated categories and require the use of contextual and similarity information.

Game Theoretic Nonnegative Matrix Factorization (GTNMF) is our approach to NMF clustering refinement. We extract the feature vectors of each object in a data set and then, depending on the NMF algorithm used, give as input to NMF either the feature vectors or a similarity matrix. GTNMF takes as input the matrix W obtained with NMF and the similarity graph A of the data set, and produces a consistent clustering of the data. These limitations can be overcome by using the relational information of the data and performing a consistent labeling.
For this reason, in this paper we employ a powerful tool derived from evolutionary game theory, which allows the clustering obtained with NMF to be reorganized so that it agrees with the structure of the data set. With our approach, the cluster membership has to be re-negotiated for all the objects. To this end, we adopt a dynamical system perspective in which similar objects must belong to similar clusters, so that the final clustering is consistent with the structure of the data set. This perspective has proved effective in different semantic classification scenarios, which involve a large number of interrelated categories and require the use of contextual and similarity information.
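The refinement step described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the paper's implementation. It assumes that the similarity graph A is built from cosine similarity, that each player's strategy is a row of W normalized to sum to one, and that strategies evolve by discrete-time replicator dynamics, a standard dynamical system from evolutionary game theory. All function names and parameters are hypothetical.

```python
import numpy as np


def cosine_similarity_graph(features):
    """Weighted similarity graph A (assumed here: cosine similarity).

    Negative similarities are clipped to zero and self-loops are removed,
    so that a player is influenced only by the other players.
    """
    features = np.asarray(features, dtype=float)
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    A = np.clip(unit @ unit.T, 0.0, None)
    np.fill_diagonal(A, 0.0)
    return A


def refine_with_replicator_dynamics(W, A, n_iter=100, tol=1e-6, eps=1e-12):
    """Refine soft cluster memberships W (n x k) using the graph A (n x n).

    Each row of the returned matrix is a probability distribution over
    clusters. A cluster gains weight for a player when similar players
    also favour it.
    """
    X = np.asarray(W, dtype=float) + eps
    X = X / X.sum(axis=1, keepdims=True)      # initial strategy space from NMF
    for _ in range(n_iter):
        payoff = A @ X                        # support for each cluster from neighbours
        X_new = X * payoff                    # replicator step (unnormalized)
        row_sum = X_new.sum(axis=1, keepdims=True)
        # Players without neighbours receive zero payoff and keep their strategy.
        X_new = np.where(row_sum > 0, X_new / np.maximum(row_sum, eps), X)
        converged = np.abs(X_new - X).max() < tol
        X = X_new
        if converged:
            break
    return X


if __name__ == "__main__":
    # Two groups of three mutually similar objects.
    A = np.zeros((6, 6))
    A[:3, :3] = 1.0
    A[3:, 3:] = 1.0
    np.fill_diagonal(A, 0.0)
    # Object 2 is similar to objects 0 and 1, but NMF slightly prefers cluster 1.
    W = np.array([[0.9, 0.1], [0.9, 0.1], [0.45, 0.55],
                  [0.1, 0.9], [0.1, 0.9], [0.1, 0.9]])
    print("before:", W.argmax(axis=1))
    print("after: ", refine_with_replicator_dynamics(W, A).argmax(axis=1))
```

In the example, the third object starts with a slight preference for the second cluster, but both of its neighbours favour the first, so the dynamics move it to the first cluster. This is the kind of consistency that taking the arg-max of W alone does not guarantee.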

Similar resources

A Projected Alternating Least square Approach for Computation of Nonnegative Matrix Factorization

Nonnegative matrix factorization (NMF) is a common method in data mining that has been used in different applications as a dimension reduction, classification, or clustering method. Methods based on the alternating least squares (ALS) approach are usually used to solve this non-convex minimization problem. At each step of ALS algorithms, two convex least squares problems must be solved, which causes high com...

NGTSOM: A Novel Data Clustering Algorithm Based on Game Theoretic and Self- Organizing Map

Identifying clusters is an important aspect of data analysis. This paper proposes a novel data clustering algorithm to increase the clustering accuracy. A novel game theoretic self-organizing map (NGTSOM) and neural gas (NG) are used in combination with Competitive Hebbian Learning (CHL) to improve the quality of the map and provide a better vector quantization (VQ) for clustering data. Different ...

A Modified Digital Image Watermarking Scheme Based on Nonnegative Matrix Factorization

This paper presents a modified digital image watermarking method based on nonnegative matrix factorization. First, the host image is factorized into the product of three nonnegative matrices. Then, the centric matrix is transferred to the discrete cosine transform domain. The watermark is embedded in the low frequency band of this matrix, and next the inverse of the transform is computed. Finally, watermarked ...

Symmetric Nonnegative Matrix Factorization for Graph Clustering

Nonnegative matrix factorization (NMF) provides a lower rank approximation of a nonnegative matrix, and has been successfully used as a clustering method. In this paper, we offer some conceptual understanding for the capabilities and shortcomings of NMF as a clustering method. Then, we propose Symmetric NMF (SymNMF) as a general framework for graph clustering, which inherits the advantages of N...

Publication date: 2017